76 research outputs found

    Clustering Via Nonparametric Density Estimation: the R Package pdfCluster

    The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. After summarizing the main aspects of the methodology, we describe the features and usage of the package, and finally illustrate how it works with the aid of two datasets.

    Nonparametric clustering for image segmentation

    Menardi, Giovanna

    On the stability of performance measures over time.

    Performance persistence is a relevant issue when evaluating the predictability of future results of managed portfolios. A related crucial aspect is the stability over time of the measure used to assess performance, defined as the degree of association between the rankings of financial assets induced by the performance measure across subsequent periods. In this work, a general class of possible criteria to measure stability is proposed. Attention is then focused on a specific index, whose asymptotic expected value and variance are derived under the null hypothesis of absence of stability. Furthermore, two statistical tests for evaluating the significance of stability are discussed. An application to a large set of US equity mutual funds shows that stability may vary remarkably as the performance measure, or the width of the time window over which it is computed, changes.

    Double clustering for rating mutual funds

    Due to the increasing proliferation of mutual funds, in-depth evaluation of the available products for portfolio selection purposes is a difficult task. Hence, classification schemes giving quick information about which funds are worth monitoring are often provided. The aim of this work is to show an application of clustering methods to mutual funds' historical data. Starting from the monthly time series of the Net Asset Values of a specific style-based category, namely the Large Blend US mutual funds, we apply distance-based clustering methods twice on a set of return, risk and performance measures: first, with the aim of reducing data dimension, and second, to cluster funds into homogeneous classes. The adopted procedure has the merit of producing a partition of funds that is readily interpretable from a financial point of view; it is further possible to rank the identified groups, thus obtaining a rating of funds that accounts for different propensities toward risk exposure.

    Training and assessing classification rules with unbalanced data

    The problem of modeling binary responses by using cross-sectional data has been addressed with a number of satisfying solutions that draw on both parametric and nonparametric methods. However, there exist many real situations where one of the two responses (usually the most interesting for the analysis) is rare. It has been widely reported that this class imbalance heavily compromises the process of learning, because the model tends to focus on the prevalent class and to ignore the rare events. Not only is the estimation of the classification model affected by a skewed distribution of the classes, but the evaluation of its accuracy is also jeopardized, because the scarcity of data leads to poor estimates of the model’s accuracy. In this work, the effects of class imbalance on model training and model assessment are discussed. Moreover, a unified and systematic framework for dealing with both problems is proposed, based on a smoothed bootstrap re-sampling technique.

    Effect of training set selection when predicting defaulter SMEs with unbalanced data

    We focus on credit scoring methods to separate defaulter small and medium enterprises (SMEs) from non-defaulter ones. In this framework, a typical problem occurs because the proportion of defaulter firms is very close to zero, leading to a class imbalance problem. Moreover, a form of bias may affect the classification: classification models are usually based on balance sheet items of large corporations, which are not randomly selected. We investigate how different criteria of sample selection may affect the accuracy of the classification and how this problem is strongly related to the imbalance of the classes.

    AUC-based gradient boosting for imbalanced classification
